We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$f = \sum_{i=1}^k w_i f_i, \quad \sum_{i=1}^k w_i = 1, \quad w_i > 0,$$ and we are interested in learning each component $f_i$. Without any assumptions on the $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i) \cap \text{supp}(\nu_j) = \emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log\frac{1}{\varepsilon})}$ samples are necessary for estimating each $f_i$. Unlike parametric mixtures, the difficulty does not arise from the order $k$ or from small weights $w_i$, and unlike nonparametric density estimation, it does not arise from the curse of dimensionality, irregularity, or inhomogeneity. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. To show that this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log\frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment matching and tensor methods, our proof involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.
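For concreteness, here is a minimal sketch (ours, not the paper's) of drawing samples from such a mixture, with each component a standard Gaussian convolved with a uniform density $\nu_i$ on disjoint intervals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: k = 2 components, each f_i = N(0, 1) * nu_i,
# where nu_i is uniform on a compact interval and the supports are disjoint.
weights = np.array([0.6, 0.4])            # w_i > 0, summing to 1
supports = [(-3.0, -1.0), (1.0, 3.0)]     # supp(nu_1) and supp(nu_2) are disjoint

def sample_mixture(n):
    """Draw n i.i.d. samples from f = sum_i w_i (N(0,1) * nu_i)."""
    comps = rng.choice(len(weights), size=n, p=weights)  # latent component labels
    lo = np.array([supports[c][0] for c in comps])
    hi = np.array([supports[c][1] for c in comps])
    centers = rng.uniform(lo, hi)          # draw from nu_i
    return centers + rng.normal(size=n)    # convolve with the standard Gaussian

x = sample_mixture(10_000)  # the learner sees only x, never the component labels
```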
给定点设置$ p \ subset \ mathbb {r} ^ $,$ p $的内核密度估计被定义为\ [\ overline {\ mathcal {g}} _ p(x)= \ frac {1} {\ left | p \ light |} \ sum_ {p \ in p} e ^ { - \ ltver \ ltvert xp \ light \ rvert ^ 2}为任何$ x \ in \ mathbb {r} ^ d $。我们研究如何构建一个小额Q $ q $,使得$ p $的内核密度估计是由$ q $的内核密度估计近似。此子集$ q $称为coreset。这项工作中的主要技术是在差异理论上设定$ p $的$ \ PM 1 $焦虑,我们利用BANASZCZYK定理。当$ d> 1 $是一个常数时,我们的施工给出了一个尺寸的尺寸$ o \ o \ lex(\ frac {1} {\ varepsilon} \右),而不是$ o lex的最熟知的结果(\Frac {1} {\ varepsilon} \ sqrt {\ log \ frac {1} {\ varepsilon}} \右)$。它是第一个结果,在$ \ sqrt {\ log} $ factor的屏障上突破即使是d = 2 $。
This work presents Guardian Angel, an Android application that helps visually impaired people avoid dangers in complex traffic environments. The system, composed of a pretrained YOLO model, distance estimation, and moving-direction estimation, provides information about surrounding vehicles and alerts users to potential dangers without requiring expensive special-purpose devices. Through an experiment with 8 subjects, we confirmed that, with the assistance of our smartphone application, the satisfaction score in a pedestrian-crossing experiment was higher than without it at the 99% confidence level. The time required to cross the road was shorter on average with the assistance of our system, although this difference did not reach statistical significance in our experiment. The application has been published on the Google Play Store and is open to the public for free.
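A hypothetical sketch of the kind of alert logic such a pipeline could use, with an assumed pinhole-camera distance estimate and bounding-box growth as a proxy for approach; the app's actual models and thresholds are not specified in the abstract:

```python
# Illustrative only: all constants and field names below are assumptions.
FOCAL_PX = 1000.0      # assumed camera focal length in pixels
CAR_HEIGHT_M = 1.5     # assumed real-world vehicle height in meters

def estimate_distance(box_height_px):
    """Pinhole-camera distance estimate from a YOLO bounding-box height."""
    return FOCAL_PX * CAR_HEIGHT_M / box_height_px

def is_dangerous(prev_box, curr_box, dist_threshold_m=10.0):
    """Alert if a detected vehicle is close and its box is growing (approaching)."""
    approaching = curr_box["h"] > prev_box["h"]
    close = estimate_distance(curr_box["h"]) < dist_threshold_m
    return approaching and close
```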
We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult because how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models of such learning including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, on image recognition, low-resource language processing, and character recognition.
Existing measures and representations for trajectories have two longstanding fundamental shortcomings: they are computationally expensive, and they cannot guarantee the `uniqueness' property of a distance function, i.e., dist(X,Y) = 0 if and only if $X = Y$, where $X$ and $Y$ are two trajectories. This paper proposes a simple yet powerful way to represent trajectories and measure the similarity between two trajectories using a distributional kernel, addressing both shortcomings. It is a principled approach based on kernel mean embedding, which has a strong theoretical underpinning. It has three distinctive features in comparison with existing approaches. (1) A distributional kernel is used for the very first time for trajectory representation and similarity measurement. (2) It does not rely on the point-to-point distances used in most existing trajectory distances. (3) It requires no learning, unlike existing learning and deep learning approaches. We show the generality of this new approach in three applications: (a) trajectory anomaly detection, (b) anomalous sub-trajectory detection, and (c) trajectory pattern mining. We identify that the distributional kernel has (i) a unique data-dependent property, together with the above uniqueness property, which are the key factors leading to its superior task-specific performance; and (ii) a runtime that is orders of magnitude faster than existing distance measures.
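As an illustration of the kernel mean embedding idea, the following sketch scores trajectory similarity as the inner product of mean embeddings, using a Gaussian kernel as a stand-in for the distributional kernel used in the paper:

```python
import numpy as np

def gauss_k(a, b, gamma=1.0):
    """Pointwise Gaussian kernel matrix between point sets a (n, d) and b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def traj_similarity(X, Y, gamma=1.0):
    """Kernel mean embedding similarity <mu_X, mu_Y>: the mean of pairwise
    kernel values between the points of the two trajectories."""
    return gauss_k(X, Y, gamma).mean()

def traj_distance(X, Y, gamma=1.0):
    """MMD-style distance ||mu_X - mu_Y|| in the RKHS. With a characteristic
    kernel this is 0 iff the two point distributions coincide (uniqueness)."""
    return np.sqrt(traj_similarity(X, X, gamma)
                   - 2 * traj_similarity(X, Y, gamma)
                   + traj_similarity(Y, Y, gamma))
```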
Recent studies have shown that using an external Language Model (LM) benefits end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear rarely in the training set is still quite challenging. Long-tail prediction problems have been widely studied in many applications, but have only been addressed by a few studies for ASR and LMs. In this paper, we propose a new memory-augmented, lookup-dictionary-based Transformer architecture for LMs. The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens. Through extensive experiments on Chinese and English data sets, our proposed method is shown to outperform the baseline Transformer LM by a large margin on both word/character error rate and tail-token error rate, without any impact on decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting ASR decoding performance, especially for long-tail tokens.
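A rough sketch of the lookup idea, using a post-hoc kNN-style dictionary as a stand-in (the paper's dictionary is trained into the Transformer itself, so treat all names here as hypothetical):

```python
import torch

class LookupDictionary:
    """Stores (context embedding, next token) pairs harvested from training data."""
    def __init__(self, keys, token_ids, vocab_size):
        self.keys = keys            # (N, d) context embeddings
        self.token_ids = token_ids  # (N,) token that followed each stored context
        self.vocab_size = vocab_size

    def probs(self, query, k=8):
        """Next-token distribution from the k nearest stored contexts."""
        d = torch.cdist(query[None], self.keys)[0]   # (N,) distances to all keys
        idx = d.topk(k, largest=False).indices
        w = torch.softmax(-d[idx], dim=0)            # closer context => larger weight
        p = torch.zeros(self.vocab_size)
        p.scatter_add_(0, self.token_ids[idx], w)
        return p

def interpolate(lm_probs, dict_probs, lam=0.3):
    """Mix the Transformer LM distribution with the lookup distribution,
    boosting long-tail tokens seen in similar training contexts."""
    return (1 - lam) * lm_probs + lam * dict_probs
```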
Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify such changes, they still suffer from missing subtle changes, poor scalability, and/or sensitivity to noise points. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on the recent Isolation Distributional Kernel (IDK). iCID identifies a change interval when there is a high dissimilarity score between two non-homogeneous temporally adjacent intervals. The data-dependent property and finite feature map of IDK enable iCID to efficiently identify various types of change points in data streams while tolerating noise points. Moreover, the proposed online and offline versions of iCID have the ability to optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
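A minimal sketch of the change-interval idea, scoring the dissimilarity of two adjacent sliding windows with a squared MMD under a Gaussian kernel as a stand-in for IDK:

```python
import numpy as np

def gauss_k(a, b, gamma=1.0):
    """Gaussian kernel matrix between two univariate samples a (n,) and b (m,)."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def interval_dissimilarity(A, B, gamma=1.0):
    """Squared MMD between two intervals of a univariate stream."""
    return (gauss_k(A, A, gamma).mean()
            - 2 * gauss_k(A, B, gamma).mean()
            + gauss_k(B, B, gamma).mean())

def detect_change_intervals(stream, w=50, threshold=0.1):
    """Slide two adjacent windows of width w over the stream; flag a change
    interval wherever the dissimilarity between them is high."""
    scores = [interval_dissimilarity(stream[t - w:t], stream[t:t + w])
              for t in range(w, len(stream) - w)]
    return [t + w for t, s in enumerate(scores) if s > threshold]
```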
Video-language pre-training has advanced the performance of various downstream video-language tasks. However, most previous methods directly inherit or adapt typical image-language pre-training paradigms to video-language pre-training, thus not fully exploiting the unique characteristic of video: temporality. In this paper, we propose HiTeA, a Hierarchical Temporal-Aware video-language pre-training framework with two novel pre-training tasks for modeling cross-modal alignment between moments and texts as well as the temporal relations of video-text pairs. Specifically, we propose a cross-modal moment exploration task to explore moments in videos, which results in detailed video moment representations. Besides, the inherent temporal relations are captured by aligning video-text pairs as a whole at different time resolutions with a multi-modal temporal relation exploration task. Furthermore, we introduce the shuffling test to evaluate the temporal reliance of datasets and video-language pre-training models. We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporally oriented datasets (e.g., SSv2-Template and SSv2-Label), with 8.6% and 11.1% improvement respectively. HiTeA also demonstrates strong generalization ability when directly transferred to downstream tasks in a zero-shot manner. Models and demo will be available on ModelScope.
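A sketch of what a shuffling test could look like (illustrative; `model`, `metric`, and the data format are our assumptions, not the paper's interface):

```python
import random

def shuffling_test(model, videos, texts, metric):
    """Compare task performance on original vs. frame-shuffled videos.
    A small gap suggests the benchmark (or the model) relies little on
    temporal order; a large gap indicates genuine temporal reliance."""
    original = metric(model, videos, texts)
    shuffled_videos = []
    for frames in videos:
        frames = list(frames)
        random.shuffle(frames)   # destroy temporal order, keep visual content
        shuffled_videos.append(frames)
    shuffled = metric(model, shuffled_videos, texts)
    return original - shuffled   # the temporal-reliance gap
```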
Artificial Intelligence (AI) has become commonplace for solving routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to widen, consequently creating a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address this barrier to clinical deployment, we have formed the MONAI Consortium, an open-source community that is building standards for AI deployment in healthcare institutions and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem-solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment, and propose solutions. Our report provides guidance on the processes that take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical radiology workflow. We also present a taxonomy of radiology AI use-cases. Through this report, we intend to educate stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
Mixup is a popular data augmentation technique for training deep neural networks, in which additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on this novel insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, improving over Mixup by 0.8% in ImageNet top-1 accuracy.
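For reference, a minimal sketch of the base Mixup operation (the paper's improved variant is not reproduced here):

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Standard Mixup: convex combinations of input pairs and their labels.
    x: (n, ...) inputs; y: (n, num_classes) one-hot labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient ~ Beta(alpha, alpha)
    perm = rng.permutation(len(x))        # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```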